Suppose we are given the following dataset around monthly revenue and expenses:
There are many ways to visualize this data. For example, we could use the columns Month, Revenue, and Expense to summarize the expenses and revenue by month, and plot the relationship between expenses and revenue:
Is there any useful information we can glean from looking at the above chart?
Answer:
We notice that there is a fairly strong linear relationship between revenue and expenses. If we were to fit a regression line to this data (blue dotted line in the chart below), we would see that there are two points (March and August) that represent the furthest distance from either side of the dotted line. In other words, they are the two cases in the linear relationship between revenue and expenses that fit the relationship the most poorly. As such, we might want to ask why expenses were so low in relation to revenue in August, and why expenses were so high in relation to revenue in March.
We can also create another scatterplot using the columns Fund, Revenue, and Expense. Again we’ll look at the relationship between revenue and expenses, but this time we’ll color the data points by the type of fund they belong to:
What kinds of additional new insights can we gain from the data by visualizing it via the above chart?
Answer:
We can see that the funds behave differently from each other when looking at the relationship between revenue and expenses. In particular, Fund A behaves quite differently than Funds B & C. As revenue increases (moving from left to right on the x-axis), expenses increase much faster in Fund A than they do in Funds B and C. It appears that there is still a positive linear relationship present between revenue and expenses in Fund B, such that expenses increase as revnue increases (but at a slower rate than Fund A). Fund C, however, does not appear to have much of a relationship between revenue and expenses (an increase in revenue does not necessarily imply an increase or decrease in expenses).
Next, let’s create a bar chart using the columns Fund, Revenue, and Expense, where the bars represent the total revenue and expenses for the year, and the chart is grouped by the different funds:
What new questions might we have after visualizing the data in the above chart?
Answer:
Looking at Funds B & C, we can see that Fund C had slightly higher revenue than Fund B. This was met with slightly higher expenses in Fund C than in Fund B. Conversely, Fund A had lower revenue than both Funds B & C, yet it had the highest amount of expenses. From an audit perspective, we may want to ask why the ratio between revenue and expenses in Fund A behaves contrary to the ratio between revenue and expenses in Funds B & C.
Lastly, we can visualize the data as a time series, by plotting lines representing the revenue and expenses by date, and breaking the data into three charts representing the different funds:
What final questions might we have after visualizing the data in the above three charts?
Answer:
We can see that the relationship over time between revenue and expenses is highly correlated in Fund A (increases in revenue trends are met with increase in expense trends, and the same can be said for decreases; if you pushed the red expense line upwards in the chart, it would almost trace over the revenue line). This same relationship over time between revenue and expenses is weakly correlated in Fund B, and no correlation appears to be present between revenue and expenses over time in Fund C. Also, in Funds A & B, there appear to be peaks in the expenses trend in months 3, 6, and 9 (March, June, and September), while these months represent valleys in the expenses trend for Fund C during these months. This might lead us to ask why these months represent higher expense periods in Funds A & B, and why this contrasts with the lower expense periods during these months in Fund C.